X chromosome-wide association study of quantitative biomarkers from the Alzheimer’s Disease Neuroimaging Initiative study

Introduction Alzheimer’s disease (AD) is a complex neurodegenerative disease with high heritability. Compared with autosomes, a higher proportion of the disorder-associated genes on the X chromosome are expressed in the brain. However, only a few studies have focused on identifying susceptibility loci for AD on the X chromosome.

Methods Using data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, we conducted an X chromosome-wide association study (XWAS) between 16 AD quantitative biomarkers (QBs) and 19,692 single nucleotide polymorphisms (SNPs), based on both cross-sectional and longitudinal analyses.

Results We identified 15 SNPs that were statistically significantly associated with different QBs of AD. In the cross-sectional study, six SNPs (rs5927116, rs4596772, rs5929538, rs2213488, rs5920524, and rs5945306) are located in or near six genes (DMD, TBX22, LOC101928437, TENM1, SPANXN1, and ZFP92), which have been reported in the literature to be associated with schizophrenia or neuropsychiatric diseases. In the longitudinal study, four SNPs (rs4829868, rs5931111, rs6540385, and rs763320) are located in or near two genes (RAC1P4 and AFF2), which have been shown in the literature to be associated with brain development or intellectual disability, while no functional annotations have been found for the other five novel SNPs (rs12157031, rs428303, rs5953487, rs10284107, and rs5955016).

Discussion Fifteen SNPs were found to be statistically significantly associated with the QBs of AD. Follow-up molecular genetic studies are needed to verify whether they are indeed related to AD. The findings in this article expand our understanding of the role of the X chromosome in disease susceptibility, offer new insights into the molecular genetics of AD, and may provide mechanistic clues for further AD-related studies.

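the variances of the original or transformed QB across genotypes are equal (i.e., $\sigma^2_{aa} = \sigma^2_{Aa} = \sigma^2_{AA}$, $\sigma^2_{a} = \sigma^2_{A}$ and no restrictions on the means) and $H_0$: both the means and the variances of the original or transformed QB across genotypes are equal (i.e., $\mu_{aa} = \mu_{Aa} = \mu_{AA}$, $\mu_{a} = \mu_{A}$, $\sigma^2_{aa} = \sigma^2_{Aa} = \sigma^2_{AA}$ and $\sigma^2_{a} = \sigma^2_{A}$).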

Methods testing for means
QXcat and QZmax are two sex-stratified X-chromosomal association tests for the means of the original or transformed QB. For females, the model can be written as

$$Y_i = \beta_{0f} + \beta_1 I_1(G_i) + \beta_2 I_2(G_i) + \boldsymbol{\gamma}_f^{\top}\mathbf{Z}_i + \varepsilon_i, \quad i = 1, 2, \ldots, n_f, \tag{S1}$$

where $\beta_{0f}$ is the intercept; $I_1(G_i) = I\{G_i \in \{Aa, AA\}\}$ and $I_2(G_i) = I\{G_i = AA\}$ are two indicator variables with respect to the genotype $G_i$, and the corresponding regression coefficients are $\beta_1$ and $\beta_2$, respectively; $\mathbf{Z}_i$ denotes a vector of the covariates (i.e., age, education level, APOE4 allelic dosage, the top 10 principal components, and the batch) for female $i$, with the corresponding regression coefficients being $\boldsymbol{\gamma}_f$; $\varepsilon_i$ is a random error which follows $N(0, \sigma^2_{aa})$, $N(0, \sigma^2_{Aa})$ and $N(0, \sigma^2_{AA})$ for genotypes $aa$, $Aa$ and $AA$, respectively.

For males, we use the following model to test for the association between the original or transformed QB and the SNP:

$$Y_j = \beta_{0m} + \beta_m G_j + \boldsymbol{\gamma}_m^{\top}\mathbf{Z}_j + \varepsilon_j, \quad j = 1, 2, \ldots, n_m, \tag{S2}$$

where $\beta_{0m}$ is the intercept and $\beta_m$ is the regression coefficient of $G_j$; $\mathbf{Z}_j$ is a vector of the covariates (i.e., age, education level, APOE4 allelic dosage, the top 10 principal components, and the batch) for male $j$, with the corresponding regression coefficients being $\boldsymbol{\gamma}_m$; $\varepsilon_j$ is a random error which follows $N(0, \sigma^2_{a})$ and $N(0, \sigma^2_{A})$ for genotypes $a$ and $A$, respectively.

Since some factors (such as genotype-by-environment interactions (Wang et al., 2019a) and XCI (Deng et al., 2019)) may lead to unequal variances of the original or transformed QB across different genotypes, the parameters of models (S1) and (S2) can be estimated by the weighted least squares method. By jointly testing $H_0$: $\beta_1 = \beta_2 = \beta_m = 0$, we can obtain the $p$-values of QXcat and QZmax by considering various XCI patterns (Wang et al., 2014; Yu et al., 2022) and different dosage compensation patterns (Wang et al., 2019b), respectively. The main difference between QXcat and QZmax lies in how the $p$-value is computed; for details, please refer to Yang et al. (2022).
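The female genotype coding described for model (S1) can be illustrated with a minimal sketch. It assumes genotypes are stored as two-character strings and ignores covariates and error terms; the function names `female_indicators` and `female_mean_shift` are hypothetical and are not part of the original analysis code.

```python
def female_indicators(genotype, minor="A", major="a"):
    """Return (I1, I2) for a female genotype such as "aa", "Aa" or "AA".

    I1 is 1 when at least one minor allele is present, and
    I2 is 1 only when both alleles are the minor allele.
    """
    n_minor = genotype.count(minor)
    if len(genotype) != 2 or n_minor + genotype.count(major) != 2:
        raise ValueError(f"unexpected female genotype: {genotype!r}")
    i1 = 1 if n_minor >= 1 else 0
    i2 = 1 if n_minor == 2 else 0
    return i1, i2


def female_mean_shift(genotype, beta1, beta2, minor="A", major="a"):
    """Mean shift relative to the homozygous-major genotype (illustrative)."""
    i1, i2 = female_indicators(genotype, minor, major)
    return beta1 * i1 + beta2 * i2
```

Under this coding, the heterozygous genotype is shifted by `beta1` and the homozygous-minor genotype by `beta1 + beta2`, both relative to the homozygous-major genotype.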
For the first T test, which is the weighted version of the T test of Özbek et al. (2018), the model can be written as

$$Y_i = \beta_0 + \beta_S S_i + \beta_G G_i + \beta_{GS} G_i S_i + \boldsymbol{\gamma}^{\top}\mathbf{Z}_i + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{S3}$$

where $\beta_0$ is the intercept; $S_i$ is the sex of subject $i$ with the regression coefficient being $\beta_S$, where females are coded as 0 and males are coded as 1; $\beta_G$ and $\beta_{GS}$ are the regression coefficients of $G_i$ and the interaction term $G_i S_i$, respectively; $\mathbf{Z}_i$ is a vector of the covariates without sex (i.e., age, education level, APOE4 allelic dosage, the top 10 principal components, and the batch) for subject $i$, with the corresponding regression coefficients being $\boldsymbol{\gamma}$; $\varepsilon_i$ is a random error. Testing for differences in the means of the original or transformed QB is then achieved by testing $H_0$: $\beta_G = \beta_{GS} = 0$ via the standard regression $F$ test, where the model is fitted using the weighted least squares method.

Finally, the second T test is obtained by adding a variable indicative of heterozygous females (Chen et al., 2021) to model (S3), i.e.,

$$Y_i = \beta_0 + \beta_S S_i + \beta_G G_i + \beta_D D_i + \beta_{GS} G_i S_i + \boldsymbol{\gamma}^{\top}\mathbf{Z}_i + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{S4}$$

where $D_i = I\{G_i = Aa\}$ indicates whether or not subject $i$ is a heterozygous female, and $\beta_D$ is its regression coefficient. By jointly testing $H_0$: $\beta_G = \beta_D = \beta_{GS} = 0$, we can check whether the SNP affects the mean values of the original or transformed QB.
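For reference, the standard regression $F$ test of $q$ jointly constrained coefficients in a weighted least squares fit is usually written as below. The symbols $\mathrm{RSS}_0$, $\mathrm{RSS}_1$, $w_i$, $q$, $n$ and $k$ are notation introduced here for illustration and are not taken from the original; the weights $w_i$ are assumed to be known (for example, inverse genotype-specific variance estimates).

```latex
\[
F \;=\; \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/q}{\mathrm{RSS}_1/(n - k)},
\qquad
\mathrm{RSS} \;=\; \sum_{i=1}^{n} w_i \,\bigl(Y_i - \hat{Y}_i\bigr)^2 ,
\]
% RSS_0: weighted residual sum of squares of the model fitted under H_0
% RSS_1: weighted residual sum of squares of the full model
% q:     number of coefficients set to zero under H_0
% k:     number of regression parameters in the full model
% Under H_0 and normal errors, F follows an F distribution with (q, n - k) degrees of freedom.
```

With this notation, the two-coefficient hypothesis tested for model (S3) corresponds to $q = 2$, and the three-coefficient hypothesis tested for model (S4) corresponds to $q = 3$.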

Methods testing for variances
wM3VNA3.3 is a two-stage method to test the effect of the SNP on phenotypic variances (Deng et al., 2019). In stage 1, the original or transformed QB is adjusted for a shift in location (mean or median) using model (S5), where the estimated residuals (denoted by $\hat{\varepsilon}_i$'s) can be calculated via the least absolute deviations method. In stage 2, the response ($d_i = |\hat{\varepsilon}_i|$) is the absolute residual obtained from stage 1. In the stage 2 model (S6), $J_1(G_i) = I\{G_i \in \{Aa, A\}\}$ and $J_2(G_i) = I\{G_i = AA\}$ are two indicator variables, respectively, for the $Aa$ + $A$ and $AA$ groups, and $\theta_1$ and $\theta_2$ are the corresponding regression coefficients; $\theta_3$ is the regression coefficient of the interaction term $J_1(G_i) S_i$, and $e_i$ is a random error. As such, testing for variance heterogeneity is achieved by testing $H_0$: $\theta_1 = \theta_2 = \theta_3 = 0$ via the standard regression $F$ test, where the model is fitted using the ordinary least squares method.
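The two-stage idea can be sketched in a deliberately simplified form. This is not the wM3VNA3.3 procedure itself: it assumes a stage 1 model containing only genotype-group indicators (for which the least absolute deviations fit reduces to group medians), omits sex, covariates and the final $F$ test, and uses hypothetical function names.

```python
from statistics import mean, median


def absolute_residuals_by_group(values, groups):
    """Stage 1 (simplified): centre each value at its group median
    and return the absolute residuals."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    centres = {group: median(vals) for group, vals in by_group.items()}
    return [abs(value - centres[group]) for value, group in zip(values, groups)]


def mean_absolute_residual(values, groups):
    """Stage 2 (simplified): average absolute residual per group.

    Clearly different averages across groups point towards
    variance heterogeneity; no formal test is performed here.
    """
    residuals = absolute_residuals_by_group(values, groups)
    by_group = {}
    for resid, group in zip(residuals, groups):
        by_group.setdefault(group, []).append(resid)
    return {group: mean(resids) for group, resids in by_group.items()}
```

In the full method, the absolute residuals would instead be regressed on the stage 2 predictors, and the joint significance of the coefficients would be assessed with the regression $F$ test.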

Methods simultaneously testing for means and variances
When both the means and the variances of the original or transformed QB differ across genotypes, a more powerful approach to identifying susceptibility loci is to jointly test for the mean and variance effects. Based on this, Yang et al. (2022) further proposed two mean-variance-based tests: QMVXcat, which combines $p_{\mathrm{wM3VNA3.3}}$ with $p_{\mathrm{QXcat}}$, and QMVZmax, which combines $p_{\mathrm{wM3VNA3.3}}$ with $p_{\mathrm{QZmax}}$, based on Fisher's method (Fisher et al., 1967). Under $H_0$, both QMVXcat and QMVZmax asymptotically follow a chi-square distribution with 4 degrees of freedom (Chen et al., 2017).
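Fisher's combination of a variance-test $p$-value and a mean-test $p$-value can be sketched as follows. The function name `fisher_combine` is hypothetical, and the sketch relies on the chi-square reference distribution with 4 degrees of freedom stated above, whose upper-tail probability has the closed form $e^{-x/2}(1 + x/2)$.

```python
import math


def fisher_combine(p_variance, p_mean):
    """Combine two p-values with Fisher's method.

    Returns the statistic -2 * (ln p1 + ln p2) and its upper-tail
    probability under a chi-square distribution with 4 degrees of freedom.
    """
    for p in (p_variance, p_mean):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * (math.log(p_variance) + math.log(p_mean))
    half = statistic / 2.0
    # Closed-form survival function of the chi-square distribution with 4 df.
    p_combined = math.exp(-half) * (1.0 + half)
    return statistic, p_combined
```

For example, two $p$-values of 0.01 give a combined $p$-value of about 0.00102, which is smaller than either input.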
2. Supplementary results

Inferring the direction of the SNP effect on the QB in cross-sectional XWAS
After identifying the statistically significantly associated SNPs in the cross-sectional XWAS, another important issue is to check the direction of the SNP effect on the original or transformed QB. However, because the cross-sectional XWAS accounts for various XCI or dosage compensation patterns, it is difficult to determine the effect size and the effect direction directly. Fortunately, by observing the signs of the regression coefficients in the mean-based tests (QXcat and QZmax) and those in the variance-based test (wM3VNA3.3), we can infer the direction of the SNP effect on the means and the variances, respectively.

Specifically, for the mean-based tests (QXcat and QZmax), we can examine the signs of the estimates $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_m$ of the regression coefficients $\beta_1$, $\beta_2$ and $\beta_m$, as presented in models (S1) and (S2) of Supplementary Methods. In females, $\beta_1$ can be regarded as the mean effect of genotype $Aa$ relative to genotype $aa$, and the mean effect of genotype $AA$ compared to $aa$ can be denoted by $\beta_1 + \beta_2$. In males, $\beta_m$ can be treated as the mean effect of genotype $A$ relative to genotype $a$. For the other two mean-based tests (the two T tests), we can also check the direction of the SNP effect on the means of the original or transformed QB by observing the signs of the regression coefficients in models (S4) and (S3) of Supplementary Methods, respectively; the conclusions are almost consistent with those from QXcat and QZmax, so for brevity we do not repeat them here.

On the other hand, for the variance-based test (wM3VNA3.3), we can obtain the estimates $\hat{\theta}_1$, $\hat{\theta}_2$ and $\hat{\theta}_3$ of the regression coefficients $\theta_1$, $\theta_2$ and $\theta_3$ in stage 2, as shown in model (S6) of Supplementary Methods. $\theta_1$ and $\theta_2$ represent the effects of genotypes $Aa$ and $AA$ on the variances of the original or transformed QB compared to $aa$ in females, respectively, and $\theta_1 + \theta_3$ can be regarded as the effect of genotype $A$ on the variances of the original or transformed QB relative to genotype $a$ in males.
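The sign-based reasoning described above can be made concrete with a small sketch. It assumes that an effect is called positive or negative only when its 95% CI excludes 0; the function name `effect_direction` is hypothetical, and the usage values are the rs5927116 estimates reported in the results that follow.

```python
def effect_direction(estimate, ci_low, ci_high):
    """Classify the direction of an effect from its estimate and 95% CI.

    Returns "positive" or "negative" when the interval excludes 0,
    and "not significant" when the interval contains 0.
    """
    if not ci_low <= estimate <= ci_high:
        raise ValueError("estimate must lie inside its confidence interval")
    if ci_low > 0:
        return "positive"
    if ci_high < 0:
        return "negative"
    return "not significant"


# rs5927116, FS Entorhinal: heterozygous females vs. homozygous-major females
het_vs_major = effect_direction(383.656, 202.911, 564.401)
# rs5927116, FS Entorhinal: second female coefficient
second_coef = effect_direction(20.785, -139.401, 180.971)
# rs5927116, FS Entorhinal: male coefficient
male_coef = effect_direction(103.977, 28.151, 179.804)
```

A CI-based rule of this kind only indicates the direction for a single coefficient; a sum of two coefficients would need its own interval.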
Supplementary Table S6 displays the point estimates and the 95% confidence intervals (CIs) of the regression coefficients $\beta_1$, $\beta_2$, and $\beta_m$ in QXcat and QZmax for the six statistically significantly associated SNPs found in the cross-sectional XWAS, and Supplementary Table S7 shows the point estimates and the 95% CIs of the regression coefficients $\theta_1$, $\theta_2$, and $\theta_3$ in stage 2 of wM3VNA3.3 for these SNPs.

SNP rs5927116 is in the DMD gene and affects the mean values of the FS Entorhinal ($p$ = 1.74 × 10 ). Note that $\hat{\beta}_1$ = 383.656 and $\hat{\beta}_m$ = 103.977 in QXcat and QZmax are greater than 0, with the corresponding CIs being 202.911 ~ 564.401 and 28.151 ~ 179.804, respectively. So, for SNP rs5927116, the mean FS Entorhinal for genotype TC is higher than that for genotype CC in females, and the mean FS Entorhinal for genotype T is larger than that for genotype C in males. Meanwhile, $\hat{\beta}_2$ = 20.785 with a 95% CI of -139.401 ~ 180.971, while $\hat{\beta}_1 + \hat{\beta}_2$ is still greater than 0. As such, the mean FS Entorhinal for genotype TT is larger than that for genotype CC and is not significantly different from that for genotype TC in females. This means that the minor allele T at rs5927116 is a causal allele, which will increase the mean FS Entorhinal.

SNP rs4596772 only influences the mean values of the FS MidTemp ($p$ = 9.94 × 10  and $p$ = 7.55 × 10 ). The point estimates (95% CIs) of the regression coefficients $\beta_1$, $\beta_2$, and $\beta_m$ in QXcat and QZmax are -1877.788 (-2806.175 ~ -949.400), -259.028 (-1006.061 ~ 488.004) and -464.309 (-790.651 ~ -137.966), respectively, suggesting that the direction of its effect on the FS MidTemp is opposite to that of SNP rs5927116 on the FS Entorhinal. Thus, the minor allele A at rs4596772 is a causal allele, which will decrease the mean FS MidTemp.

SNP rs5929538 is located in the LOC101928437 gene and affects the mean values of the transformed FDG PET ($p$ = 2.28 × 10 ). Note that $\hat{\beta}_1$ = 0.223 (95% CI: -0.200 ~ 0.646) and $\hat{\beta}_2$ = −1.175 < 0 (95% CI: -1.616 ~ -0.735) in QXcat and QZmax, so females with genotype AA at rs5929538 have lower FDG PET than those with genotype GG, while the mean effect of genotype AG compared to genotype GG in females (because the 95% CI of $\beta_1$ contains 0) and the mean effect of genotype A relative to genotype G in males ($\hat{\beta}_m$ = 0.103; 95% CI: -0.032 ~ 0.239) are not statistically significant.

Moreover, for SNP rs2213488, which is located in the TENM1 gene, only the $p$-values of QMVXcat and QMVZmax for simultaneously testing for the means and the variances of the FS Hippocampus are lower than the significance level 2.54 × 10 , where the corresponding $p$-values are 1.30 × 10  and 7.23 × 10 , respectively. This suggests that either the means or the variances of the FS Hippocampus differ across genotypes, which needs to be further investigated in a larger sample.

SNP rs5920524 affects the mean values of the transformed FDG PET ($p$ = 5.57 × 10  and $p$ = 5.97 × 10 ), and the resulting $p$-value of QMVXcat is 1.72 × 10 . Note that the point estimate (95% CI) of $\beta_1$ at rs5920524 is -0.074 (-0.346 ~ 0.199), so the mean effect of genotype TC compared to genotype CC in females is not statistically significant. In addition, both $\hat{\beta}_2$ = 0.465 and $\hat{\beta}_m$ = 0.198 in QXcat and QZmax are larger than 0, and the corresponding CIs 0.223 ~ 0.707 and 0.101 ~ 0.295 do not contain 0, so females with genotype TT at rs5920524 tend to have a greater mean of the transformed FDG PET than females with genotype CC, and the mean value of the transformed FDG PET for genotype T is greater than that for genotype C in males. This implies that the major allele C at rs5920524 is a risk allele and will decrease the FDG PET.

Finally, SNP rs5945306 only influences the mean values of the transformed FAQ ($p$ = 7.67 × 10  and $p$ = 9.22 × 10 ), and the resulting $p$-value of QMVXcat is 1.82 × 10 . Meanwhile, the point estimates (95% CIs) of the regression coefficients $\beta_1$, $\beta_2$, and $\beta_m$ in QXcat and QZmax are -0.639 (-0.899 ~ -0.380), -0.053 (-0.216 ~ 0.109) and -0.004 (-0.093 ~ 0.084), respectively, indicating that the mean values of the transformed FAQ for genotypes CC and CT are lower than that for genotype TT in females, and the mean effect of genotype C compared to genotype T in males is not statistically significant. Therefore, the major allele T at rs5945306 in females is a risk allele and tends to increase the FAQ.

Supplementary Figure S2. Q-Q plots for 15 QBs at the baseline in ADNI cohort 1 ($n$ = 741).

Supplementary Tables
Supplementary Table S1. Sample size and number of used SNPs in ADNI cohorts 1, GO/2, and 1/GO/2 for cross-sectional XWAS.
Supplementary Table S13. Estimates of time×SNP interaction effects in longitudinal XWAS only based on non-Hispanic White subjects for nine SNPs, which were identified to be statistically significant in longitudinal XWAS based on the cross-ethnic sample.

Table S6. Point estimates and 95% confidence intervals of regression coefficients $\beta_1$, $\beta_2$, and $\beta_m$ in QXcat and QZmax for six statistically significantly associated SNPs found in cross-sectional XWAS.
a FDG PET and FAQ are transformed using the rank-based inverse normal transformation.
b SNP rs2213488 is an overlapping variant in both ADNI cohorts 1 and GO/2. However, it demonstrates the statistical significance only in the analysis of ADNI cohort GO/2 with 792 subjects, while it is not statistically significant in the analysis of ADNI cohort

Table S7. Point estimates and 95% confidence intervals of regression coefficients $\theta_1$, $\theta_2$, and $\theta_3$ in stage 2 of wM3VNA3.3 for six statistically significantly associated SNPs found in cross-sectional XWAS.
a FDG PET and FAQ are transformed using the rank-based inverse normal transformation.
b SNP rs2213488 is an overlapping variant in both ADNI cohorts 1 and GO/2. However, it demonstrates the statistical significance only in the analysis of ADNI cohort GO/2 with 792 subjects, while it is not statistically significant in the analysis of ADNI cohort

Table S9. Estimates of SNP main effects in longitudinal XWAS for six SNPs found by cross-sectional XWAS.
a FDG PET and FAQ are transformed using the rank-based inverse normal transformation.

Table S11. $p$-values of all the methods in cross-sectional XWAS for nine SNPs found by longitudinal XWAS.

Table S12. $p$-values of cross-sectional XWAS only based on non-Hispanic White subjects for six SNPs, which were identified to be statistically significant in cross-sectional XWAS based on the cross-ethnic sample.
a The $p$-values less than the significance level of 2.54 × 10  are highlighted in bold.
b FDG PET and FAQ are transformed using the rank-based inverse normal transformation.
a FS Ventricles is transformed using the rank-based inverse normal transformation.
b The $p$-values less than the significance level of 2.54 × 10  are highlighted in bold.